March 5, 2018
hypothesis: a statement about what we should expect to observe if a causal claim is true (also called an empirical prediction)
Causal theory may generate several hypotheses:
Presence of cause and presence of effect. Absence of cause and absence of effect.
Higher/lower levels of \(X\) (cause) appear with higher/lower levels of \(Y\) (outcome).
Maybe we've been led astray.
X causes Y.
If \(X\) had not happened, then \(Y\) would not have happened.
X causes Y
If the claim is true, then for a case with some level of \(X\) and some level of \(Y\):
If that case had a different level of X, then the level of Y would have been different.
We never can observe a case under the counterfactual condition: we can only observe a case under one (the factual) condition.
We want to test whether \(X\) causes higher \(Y\).
For the sake of simplicity, imagine a situation with \(5\) cases.
Each observation is indexed by \(i\).
\(X_i:\) value of cause \(X\) for case \(i\); \(1\) if the cause is present, \(0\) if absent
\(Y_i:\) The outcome (dependent variable) for \(i\)
| i | \(Y_i\) | \(X_i\) |
|---|---|---|
| 1 | 6 | 1 |
| 2 | 2 | 0 |
| 3 | 8 | 1 |
| 4 | 4 | 0 |
| 5 | 6 | 1 |
Take average of \(Y\) for \(X = 1\) and for \(X = 0\) and take the difference:
\[\frac{6 + 8 + 6}{3} - \frac{2 + 4}{2}\]
\[ \frac{20}{3} - \frac{6}{2}\]
\[ \frac{20}{3} - \frac{9}{3} = \frac{11}{3}\]
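The calculation above can be checked with a short Python sketch; the data are just the five-case table:

```python
# Naive estimate: difference in mean Y between cases with X = 1 and X = 0.
X = [1, 0, 1, 0, 1]
Y = [6, 2, 8, 4, 6]

treated = [y for x, y in zip(X, Y) if x == 1]  # Y values where X = 1
control = [y for x, y in zip(X, Y) if x == 0]  # Y values where X = 0

diff = sum(treated) / len(treated) - sum(control) / len(control)
print(diff)  # 20/3 - 6/2 = 11/3 ≈ 3.67
```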
Fundamental Problem of Causal Inference:
Counterfactual causality means that every case has a value for the outcome that it would take under the factual and counterfactual conditions. We call these potential outcomes
We want to test whether \(X\) causes higher \(Y\).
For the sake of simplicity, imagine a situation with \(5\) cases.
Each observation is indexed by \(i\).
\(X_i:\) value of cause \(X\) for case \(i\); \(1\) if the cause is present, \(0\) if absent
\(Y_i^0:\) The outcome (dependent variable) for \(i\) when \(X_i = 0\)
\(Y_i^1:\) The outcome (dependent variable) for \(i\) when \(X_i = 1\)
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 6 | 6 | 0 |
| 2 | 2 | 2 | 0 |
| 3 | 8 | 8 | 0 |
| 4 | 4 | 4 | 0 |
| 5 | 6 | 6 | 0 |
We found that \(Y\) was \(\frac{11}{3}\) higher when \(X = 1\)
The truth is the average difference for each case between the condition with \(X\) and the condition without \(X\).
Truth is that there is no difference
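Because this is a hypothetical, we can compute the true average causal effect directly from the full potential-outcomes table, something real data never allow. A minimal sketch:

```python
# True average causal effect from the full potential-outcomes table
# (observable only in this hypothetical, never in real data).
Y0 = [6, 2, 8, 4, 6]  # outcome for each case if X = 0
Y1 = [6, 2, 8, 4, 6]  # outcome for each case if X = 1

effects = [y1 - y0 for y0, y1 in zip(Y0, Y1)]  # individual causal effects
true_ace = sum(effects) / len(effects)
print(true_ace)  # 0.0: X has no effect, despite the naive 11/3 estimate
```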
fundamental problem of causal inference:
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | ? | 6 | ? |
| 2 | 2 | ? | ? |
| 3 | ? | 8 | ? |
| 4 | 4 | ? | ? |
| 5 | ? | 6 | ? |
So, we still have fundamental problem of causal inference.
Can we learn anything about causality?
If causes are deterministic: cases that have the exact same set of causes acting on them behave exactly the same.
Claim: \(X\) is cause of \(Y\)
\(A,B,C\) are other possible causes of \(Y\).
| | Case 1 | Case 2 |
|---|---|---|
| X | 1 | 0 |
| A | 1 | 1 |
| B | 0 | 0 |
| C | 1 | 1 |
| Y | 1 | 0 |
| | Case 1 | Case 2 |
|---|---|---|
| X | 1 | 0 |
| A | 1 | 1 |
| B | 0 | 0 |
| C | 1 | 1 |
| Y | 1 | 1 |
| | Case 1 | Case 2 |
|---|---|---|
| X | 1 | 0 |
| A | 1 | 1 |
| B | 1 | 0 |
| C | 1 | 1 |
| Y | 1 | 0 |
If causal claim that \(X \rightarrow Y\), then:
We generate empirical prediction:
If we observe two cases to be the same in all relevant respects except for value of \(X\), then we should observe that the two cases differ in the value of \(Y\)
This empirical prediction based on a causal claim is called the comparative method or the method of difference (via John Stuart Mill)
What are the relevant similarities? How many things need to be the same? How do we identify them?
\(X \rightarrow Y\)
but also the case that \(\lbrace W_1, W_2, \ldots, W_\infty \rbrace \rightarrow Y\)
We've been trying to find a factual case \(j\) that can be a counterfactual for the factual case \(i\): identical on all attributes except the cause \(X\).
Why? No reason to assume any case is exactly like another.
Maybe we've been doing this wrong.
Instead of finding counterfactual for each case \(i\)…
The fundamental problem of causal inference states we can't observe both \(Y_i^1\) (outcome when exposed to cause) and \(Y_i^0\) (outcome when not exposed to cause) for a given \(i\).
But we can get around this if we look at many cases \(i \in \lbrace 1 \ldots n \rbrace\) and take the average.
We cannot know each individual causal effect: \(\tau_i = Y_i^1 - Y_i^0\)
But the average causal effect is the average of all individual causal effects:
\(ACE = \frac{1}{n}\sum\limits_i^n{\tau_i}\)
So it is also equal to this:
\(ACE = \frac{1}{n}\sum\limits_i^n (Y_i^1 - Y_i^0)\)
Which means it can be the difference between the averages of both potential outcomes:
\(ACE = \Big( \frac{1}{n}\sum\limits_i^n Y_i^1 \Big) - \Big( \frac{1}{n}\sum\limits_i^n Y_i^0 \Big)\)
But does this help us?
\(ACE = \Big( \frac{1}{n}\sum\limits_i^n Y_i^1 \Big) - \Big( \frac{1}{n}\sum\limits_i^n Y_i^0 \Big)\)
At best we can observe the average outcome among cases that received the cause and the average outcome among cases that did not.
And we know from above that this can go wrong:
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | ? | 6 | ? |
| 2 | 2 | ? | ? |
| 3 | ? | 8 | ? |
| 4 | 4 | ? | ? |
| 5 | ? | 6 | ? |
Take average for \(X = 1\) and for \(X = 0\) and take the difference:
\[ \frac{6 + 8 + 6}{3} - \frac{2 + 4}{2}\]
\[ \frac{20}{3} - \frac{6}{2}\]
\[ \frac{20}{3} - \frac{9}{3} = \frac{11}{3}\]
The true effect is \(0\)!
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 6 | 6 | 0 |
| 2 | 2 | 2 | 0 |
| 3 | 8 | 8 | 0 |
| 4 | 4 | 4 | 0 |
| 5 | 6 | 6 | 0 |
Then:
But we don't know the counterfactual outcomes, so how can we ensure they are similar?
Wait…
A procedure that lets us get a sample such that:
some cases get \(X\) and some cases do not get \(X\)
\(ACE = \frac{1}{n_1}\sum\limits_{i:\, X_i = 1} Y_i^1 - \frac{1}{n_0}\sum\limits_{i:\, X_i = 0} Y_i^0\), where \(n_1\) and \(n_0\) are the number of treated and control cases
All of which is observable because it is factual
If you haven't guessed it:
This is the precise logic of a randomized experiment.
All cases have equal chance of being exposed to cause.
We need to generate empirical predictions or hypotheses about what we will observe if the causal theory is correct
Causality implies a counterfactual.
Empirical Predictions for causal theories are that:
But we can't change the level of the independent variable for a case:
Fundamental Problem of Causal Inference: a key part of the empirical prediction from a causal theory is never observable
Fundamental Problem of Causal Inference is solvable when
"Treatment" and "Control" group have same potential outcomes
because they are two random samples from the same population
Best way to understand is to do it!
We have this table of potential outcomes:
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 5 | 9 | 4 |
| 2 | 4 | 8 | 4 |
| 3 | 3 | 7 | 4 |
| 4 | 2 | 6 | 4 |
Could be: effect of campaign ad on campaign contributions:
| i | ($) without ad | ($) with ad | ($) with - ($) without |
|---|---|---|---|
| 1 | 5 | 9 | 4 |
| 2 | 4 | 8 | 4 |
| 3 | 3 | 7 | 4 |
| 4 | 2 | 6 | 4 |
What is the true average causal effect?
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 5 | 9 | 4 |
| 2 | 4 | 8 | 4 |
| 3 | 3 | 7 | 4 |
| 4 | 2 | 6 | 4 |
\[\frac{4 + 4 + 4 + 4}{4} = 4\]
Let's run a randomized experiment:
When we do this at random:
Write down all possible treatment groups of size \(2\) (e.g. (1,2))
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 5 | 9 | 4 |
| 2 | 4 | 8 | 4 |
| 3 | 3 | 7 | 4 |
| 4 | 2 | 6 | 4 |
All possible treatment groups
| V1 | V2 |
|---|---|
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 3 |
| 2 | 4 |
| 3 | 4 |
Calculate the average \(Y^1\) ($ with ad) for every possible treatment (ad) group
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 5 | 9 | 4 |
| 2 | 4 | 8 | 4 |
| 3 | 3 | 7 | 4 |
| 4 | 2 | 6 | 4 |
Calculate the average \(Y^0\) ($ without ad) for every corresponding control (no ad) group
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) |
|---|---|---|---|
| 1 | 5 | 9 | 4 |
| 2 | 4 | 8 | 4 |
| 3 | 3 | 7 | 4 |
| 4 | 2 | 6 | 4 |
| T_Group | Y_1 | Y_0 |
|---|---|---|
| 1,2 | 8.5 | 2.5 |
| 1,3 | 8.0 | 3.0 |
| 1,4 | 7.5 | 3.5 |
| 2,3 | 7.5 | 3.5 |
| 2,4 | 7.0 | 4.0 |
| 3,4 | 6.5 | 4.5 |
Calculate the estimated average causal effect for every possible experiment: ($ with ad) - ($ without ad), or \(Y^1 - Y^0\)
| T_Group | Y_1 | Y_0 |
|---|---|---|
| 1,2 | 8.5 | 2.5 |
| 1,3 | 8.0 | 3.0 |
| 1,4 | 7.5 | 3.5 |
| 2,3 | 7.5 | 3.5 |
| 2,4 | 7.0 | 4.0 |
| 3,4 | 6.5 | 4.5 |
| T_Group | Y_1 | Y_0 | ACE |
|---|---|---|---|
| 1,2 | 8.5 | 2.5 | 6 |
| 1,3 | 8.0 | 3.0 | 5 |
| 1,4 | 7.5 | 3.5 | 4 |
| 2,3 | 7.5 | 3.5 | 4 |
| 2,4 | 7.0 | 4.0 | 3 |
| 3,4 | 6.5 | 4.5 | 2 |
What is the average effect across all possible experiments?
| T_Group | Y_1 | Y_0 | ACE |
|---|---|---|---|
| 1,2 | 8.5 | 2.5 | 6 |
| 1,3 | 8.0 | 3.0 | 5 |
| 1,4 | 7.5 | 3.5 | 4 |
| 2,3 | 7.5 | 3.5 | 4 |
| 2,4 | 7.0 | 4.0 | 3 |
| 3,4 | 6.5 | 4.5 | 2 |
\[\frac{6 + 5 + 4 + 4 + 3 + 2}{6} = 4\]
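This enumeration is easy to verify in code. A short Python sketch, using zero-based case indices rather than the table's 1-4 labels:

```python
from itertools import combinations

# Potential outcomes for the four cases (campaign-ad example).
Y0 = [5, 4, 3, 2]  # $ without ad
Y1 = [9, 8, 7, 6]  # $ with ad
cases = range(4)

estimates = []
for treated in combinations(cases, 2):        # all 6 possible treatment groups
    control = [i for i in cases if i not in treated]
    est = (sum(Y1[i] for i in treated) / 2    # observed mean of treated group
           - sum(Y0[i] for i in control) / 2)  # observed mean of control group
    estimates.append(est)

print(estimates)                         # [6.0, 5.0, 4.0, 4.0, 3.0, 2.0]
print(sum(estimates) / len(estimates))   # 4.0, the true ACE
```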
Why does this work?
If you wanted:
Average of \(Y^1\) for treated groups (factual outcome): 7.5
Average of \(Y^1\) for control groups (counterfactual outcome): 7.5
Average of \(Y^0\) for control groups (factual outcome): 3.5
Average of \(Y^0\) for treated groups (counterfactual outcome): 3.5
Using random assignment to treatment and control:
But suppose assignment is not random: 1 and 2 always get treated, 3 and 4 never do.
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) | \(X_i\) |
|---|---|---|---|---|
| 1 | 5 | 9 | 4 | 1 |
| 2 | 4 | 8 | 4 | 1 |
| 3 | 3 | 7 | 4 | 0 |
| 4 | 2 | 6 | 4 | 0 |
\[\frac{9+8}{2} - \frac{3+2}{2} = \frac{12}{2} = 6 \neq 4\]
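The same biased estimate can be reproduced in a few lines (again with zero-based indices):

```python
# Non-random assignment: cases 1 and 2 (indices 0 and 1) are always treated.
Y0 = [5, 4, 3, 2]
Y1 = [9, 8, 7, 6]
X = [1, 1, 0, 0]  # fixed, not random

est = (sum(y1 for y1, x in zip(Y1, X) if x == 1) / 2
       - sum(y0 for y0, x in zip(Y0, X) if x == 0) / 2)
print(est)  # 6.0, not the true effect of 4
```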
We can imagine cases with different potential outcomes have different attributes:
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) | \(X_i\) | \(W_i\) |
|---|---|---|---|---|---|
| 1 | 5 | 9 | 4 | 1 | 1 |
| 2 | 4 | 8 | 4 | 1 | 1 |
| 3 | 3 | 7 | 4 | 0 | 0 |
| 4 | 2 | 6 | 4 | 0 | 0 |
Presence of \(W\) is fine if: \(W\) unrelated to cause \(X\)
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) | \(X_i\) | \(W_i\) |
|---|---|---|---|---|---|
| 1 | 3 | 7 | 4 | 1 | 1 |
| 2 | 2 | 6 | 4 | 1 | 0 |
| 3 | 3 | 7 | 4 | 0 | 1 |
| 4 | 2 | 6 | 4 | 0 | 0 |
Presence of \(W\) is fine if: \(W\) unrelated to outcome \(Y\)
| i | \(Y_i^0\) | \(Y_i^1\) | \(Y_i^1 - Y_i^0\) | \(X_i\) | \(W_i\) |
|---|---|---|---|---|---|
| 1 | 3 | 7 | 4 | 1 | 1 |
| 2 | 2 | 6 | 4 | 1 | 1 |
| 3 | 3 | 7 | 4 | 0 | 0 |
| 4 | 2 | 6 | 4 | 0 | 0 |
Randomization ensures that, on average, no relationship between \(X\) and \(W\).
A study has internal validity when the causal effect of \(X\) on \(Y\) it finds is not biased (systematically incorrect).
selection bias: When cases that receive a "treatment" or "cause" have different potential outcomes from those that do not.
Consider this experiment:
non-excludability occurs when the treatment in an experiment bundles multiple different treatments
Does an apple a day keep the doctor away?
systematic measurement error: error produced when our measurement procedure obtains scores that are, on average, too high or too low.
We randomly assign some people to be shamed into voting (we remind them of their past voting record)
We measure whether people exposed to this treatment have higher self-reported voting rates
One conclusion: only do experiments.
external validity is the degree to which the causal relationship we find in a study matches the causal relationship and the context identified in a causal theory
Broockman and Kalla (2015): Does perspective taking change minds about minority groups?
Sample
Treatment
Outcome
Broockman and Kalla (2015)
Sample
Does exposure to partisan media change people's political attitudes and make them more extreme? (E.g. Fox News, Breitbart)
Sample: Randomly chosen group of US adults
Treatment:
Outcome:
Treatment:
More internal validity (an unbiased calculation of the causal effect) comes at a cost in external validity (the relevance of the study sample or cause to the causal theory)
Many relevant contexts for causal theories and interesting causes cannot or should not be manipulated at random:
Yes, but now that we know how and why experiments work, we can make these better.
If causal claim that \(X \rightarrow Y\), then:
We generate empirical prediction:
If we observe two cases to be the same in all relevant respects except for value of \(X\), then we should observe that the two cases differ in the value of \(Y\)
This empirical prediction based on a causal claim is called the comparative method or the method of difference (via John Stuart Mill)
What causes states to create laws designed to exclude and dominate people based on race?
Attempts to resolve conflicts within the dominant (more powerful) racial group in a society lead to the creation of a legal regime that excludes other groups.
| | United States | Brazil |
|---|---|---|
| White In-fighting | YES | NO |
| Dominant group | White Europeans | White Europeans |
| Form of Domination | Slavery | Slavery |
| Settler Colonialism | Yes | Yes |
| Era | 1860s-1960s | 1880s-1960s |
| Legal Discrimination | YES | NO |
What causes the spread of cholera?
Contaminated water causes cholera outbreaks
To us, but not in mid-19th century England
19th Century London saw repeated outbreaks of cholera, with mass death
No theory of germs, but
| | Brewers | Broad St. Residents |
|---|---|---|
| Water Source | Brewery well / beer | Pump |
| Location | Near pump | Near pump |
| Miasmas? | Same | Same |
| Timing | Same | Same |
| Cholera | No | Yes |
conjunctural causation: when effect depends on combination of causes
multiple necessary conditions: the effect occurs only in the presence of more than one cause
| | Brewers | Broad St. Residents |
|---|---|---|
| Cholera Bacteria | No | Yes |
| Water Source | Brewery well / beer | Pump |
| Location | Near pump | Near pump |
| Miasmas? | Same | Same |
| Timing | Same | Same |
| Cholera | No | Yes |
If \(X \rightarrow Y\) or \(X\) causes \(Y\), then \(X\) and \(Y\) will be correlated
If \(X\) causes \(Y\), a shift in \(X\) implies a change in the value of \(Y\)
correlation: an association or relationship between the values taken by two variables (X and Y)
correlation also has a specific mathematical definition (you don't need to memorize it):
\[r = \frac{\sum_{i}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i^n(x_i - \bar{x})^2}\sqrt{\sum_i^n (y_i - \bar{y})^2}}\]
mathematically: correlation is the degree of linear association between \(X\) and \(Y\)
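The formula for \(r\) can be translated line by line into Python; the function name `pearson_r` is just an illustrative choice:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, computed directly from the formula above."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - xbar) ** 2 for x in xs))
           * math.sqrt(sum((y - ybar) ** 2 for y in ys)))
    return num / den

# A perfectly linear, positive relationship:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```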
negative correlation: (correlation \(< 0\)) values of \(X\) and \(Y\) move in opposite direction:
positive correlation: (correlation \(> 0\)) values of \(X\) and \(Y\) move in same direction:
It is possible to see perfect correlation but small change in \(Y\) across \(X\)
It is possible to see low correlation but large change in \(Y\) across \(X\)
It is possible to see perfect nonlinear relationship between \(X\) and \(Y\) with \(0\) correlation
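The third point, a perfect nonlinear relationship with zero correlation, is easy to demonstrate: take symmetric \(x\) values and let \(y = x^2\). A minimal sketch:

```python
import math

# Perfect nonlinear relationship with zero linear correlation.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # y is completely determined by x

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
den = (math.sqrt(sum((x - xbar) ** 2 for x in xs))
       * math.sqrt(sum((y - ybar) ** 2 for y in ys)))
r = num / den
print(r)  # 0.0: no linear association at all
```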
weak correlation: values for \(X\) and \(Y\) do not cluster along line
strong correlation: values for \(X\) and \(Y\) cluster strongly along a line
strength of correlation unrelated to the slope of line describing \(X,Y\) relationship
Higher gun ownership rates cause higher rates of firearm homicide
If theory is correct: we expect positive correlation between gun ownership rate and firearms homicides
Test:
Why did US violent crime rates spike and then decline from the 1970s-1990s?
Widespread use of and then ban on leaded gasoline caused increase and decline in crime
Leaded gasoline \(\rightarrow\) Particulate lead in the air
\(\rightarrow\) children exposed to lead \(\rightarrow\) lead poisoning
\(\rightarrow\) education, aggression, inhibition problems
\(\rightarrow\) criminal behavior
If theory is true: positive correlation between childhood lead exposure and adult criminality
If theory is true: birth cohorts with more lead exposure have higher criminality rates in adulthood
A positive correlation between lead exposure and crime rates